Why Classification Models Using Array Gene Expression Data Perform So Well: A Preliminary Investigation of Explanatory Factors
Abstract
Classification models built from microarray data often report results that appear exceedingly good relative to most other domains of machine learning and clinical diagnostics. Yet array data are noisy and have very small sample-to-variable ratios. What explains such exemplary, yet counterintuitive, classification performance? Answering this question has significant implications (a) for the broad acceptance of such models by the medical and biostatistical community, and (b) for gaining valuable insight into the properties of this domain. To address this problem, we build several models for three classification tasks in a gene expression array dataset with 12,600 oligonucleotides and 203 patient cases. We then study the effects of classifier type (kernel-based/non-kernel-based, linear/non-linear), sample size, sample selection within cross-validation, and gene information redundancy. Our analyses show that gene redundancy and classifier choice have the strongest effects on performance. Linear bias in the classifiers and sample size (as long as kernel classifiers are used) have relatively small effects; the train-test sample ratio and the choice of cross-validation sample selection method appear to have small-to-negligible effects.
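The kind of experiment the abstract describes, cross-validated classification with many variables, few samples, and redundant informative genes, can be illustrated with a small sketch. This is not the authors' method or data. It assumes synthetic two-class data, swaps in a simple nearest-centroid classifier (a non-kernel method chosen only for brevity), and uses plain k-fold cross-validation. All function names (`make_data`, `k_fold_indices`, `cv_accuracy`) and all parameter values are hypothetical.

```python
import random


def make_data(n_samples, n_informative, n_noise, seed=0):
    """Synthetic two-class 'expression' data (hypothetical, not the paper's dataset).

    Every informative feature carries the same class shift plus independent
    noise, so the informative features are redundant with one another.
    """
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # balanced classes
        informative = [label + rng.gauss(0.0, 1.0) for _ in range(n_informative)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(informative + noise)
        y.append(label)
    return X, y


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and deal them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def centroid(rows):
    """Per-feature mean of a non-empty list of samples."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def cv_accuracy(X, y, k=5, seed=0):
    """k-fold cross-validated accuracy of a nearest-centroid classifier."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        # Class centroids are estimated from the training folds only.
        centroids = {
            label: centroid([X[i] for i in train_idx if y[i] == label])
            for label in (0, 1)
        }
        for i in test_idx:
            pred = min(centroids, key=lambda c: sq_dist(X[i], centroids[c]))
            correct += pred == y[i]
    return correct / len(X)


if __name__ == "__main__":
    X, y = make_data(n_samples=60, n_informative=50, n_noise=150)
    acc_many = cv_accuracy(X, y)
    acc_one = cv_accuracy([row[:1] for row in X], y)  # keep one informative feature
    print(f"1 informative feature: {acc_one:.2f}")
    print(f"50 redundant informative features + 150 noise features: {acc_many:.2f}")
```

Under these assumptions, a single noisy informative feature separates the classes only weakly, while many redundant copies of the same signal average out the noise. This is one plausible reading of how redundancy could support strong performance despite a small sample-to-variable ratio; the sketch does not reproduce the paper's analysis.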
Related resources
Global gene expression analysis using microarray to study differential vulnerability to neurodegeneration
Neurodegenerative disorders such as Parkinson’s disease, motor neuron disease and Alzheimer’s disease are characterized by loss of specific cells within certain regions of the brain. One of the most compelling questions is why specific cell populations are vulnerable to neurodegeneration. We addressed this question by studying global gene expression changes using an animal model of ...
Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine
DNA microarray gene expression data give access to a wealth of information spanning thousands of variables (genes). Analysis of this information can reveal the genetic causes of disease and of differences between tumors. In this study we try to reduce high-dimensional data by a statistical method to select valuable, high-impact genes as biomarkers and then classify ovarian tumors based on gene expression data of...
The Effects of Kainic Acid-Induced Seizure on Gene Expression of Brain Neurotransmitter Receptors in Mice Using RT2 PCR Array
Introduction: Kainic acid (KA) induces neuropathological changes in specific regions of the mouse hippocampus comparable to changes seen in patients with chronic temporal lobe epilepsy (TLE). According to different studies, the expression of a number of genes is altered in the adult rat hippocampus after status epilepticus (SE) induced by KA. This study aimed to quantitatively evaluate changes...
STUDY OF HMGA2 GENE INHIBITION WITH SPECIFIC SHRNA AND SIRNA AND INVESTIGATION OF CORRESPONDING EFFECTS ON DOWNSTREAM GENE EXPRESSION IN MDA-MB-231 CANCER CELLS: A BIOINFORMATIC AND EXPERIMENTAL STUDY
Background & Aims: The use of siRNA to silence gene expression is expanding rapidly. The aim of this study is to investigate, bioinformatically and experimentally, the inhibition of the HMGA2 gene and its corresponding effects on the expression of downstream genes in MDA-MB-231 cancer cells treated with shRNA and siRNA specific to HMGA2. Materials & Methods: To perform this bioinformatic a...
Publication date: 2003